The complexity of semiparametric models poses new challenges 
to statistical inference and model selection that frequently arise from 
real applications. In this work, we propose new estimation and variable
selection procedures for the semiparametric varying-coefficient 
partially linear model. We first study quantile regression estimates 
for the nonparametric varying-coefficient functions and the paramet- 
ric regression coefficients. To achieve nice efficiency properties, we 
further develop a semiparametric composite quantile regression pro- 
cedure. We establish the asymptotic normality of proposed estimators 
for both the parametric and nonparametric parts and show that the 
estimators achieve the best convergence rate. Moreover, we show that 
the proposed method is much more efficient than the least-squares- 
based method for many non-normal errors and that it only loses a 
small amount of efficiency for normal errors. In addition, it is shown 
that the loss in efficiency is at most 11.1% for estimating varying-coefficient
functions and is no greater than 13.6% for estimating parametric
components. To achieve sparsity with high-dimensional covariates,
we propose adaptive penalization methods for variable selection 
in the semiparametric varying-coefficient partially linear model and 
prove that the methods possess the oracle property. Extensive Monte 
Carlo simulation studies are conducted to examine the finite-sample 
performance of the proposed procedures. Finally, we apply the new 
methods to analyze the plasma beta-carotene level data. 
1. Introduction. Semiparametric regression modeling has recently become
popular in the statistics literature because it keeps the flexibility of 
nonparametric models while maintaining the explanatory power of paramet- 
ric models. The partially linear model, the most commonly used semipara- 
metric regression model, has received a lot of attention in the literature; see 
Härdle, Liang and Gao [9], Yatchew [32] and references therein for theory 
and applications of partially linear models. Various extensions of the par- 
tially linear model have been proposed in the literature; see Ruppert, Wand 
and Carroll [26] for applications and theoretical developments of semipara- 
metric regression models. The semiparametric varying-coefficient partially 
linear model, as an important extension of the partially linear model, is
becoming popular in the literature. Let $Y$ be a response variable and
$\{U, X, Z\}$ its covariates. The semiparametric varying-coefficient partially
linear model is defined to be 

(1.1) $Y = \alpha_0(U) + X^T\alpha(U) + Z^T\beta + \varepsilon,$

where $\alpha_0(U)$ is a baseline function, $\alpha(U) = \{\alpha_1(U), \ldots, \alpha_{d_1}(U)\}^T$ consists 
of $d_1$ unknown varying-coefficient functions, $\beta = (\beta_1, \ldots, \beta_{d_2})^T$ is a
$d_2$-dimensional coefficient vector and $\varepsilon$ is random error. In this paper, we will 
focus on univariate U only, although the proposed procedure is directly 
applicable for multivariate U. Zhang, Lee and Song [33] proposed an esti- 
mation procedure for the model (1.1), based on local polynomial regression 
techniques. Xia, Zhang and Tong [31] proposed a semilocal estimation procedure
to further reduce the bias of the estimator for $\beta$ suggested in Zhang, 
Lee and Song [33]. Fan and Huang [5] proposed a profile least-squares es- 
timator for model (1.1) and developed statistical inference procedures. As 
an extension of Fan and Huang [5], a profile likelihood estimation proce- 
dure was developed in Lam and Fan [18], under the generalized linear model 
framework with a diverging number of covariates. 

Existing estimation procedures for model (1.1) were built on either least- 
squares- or likelihood-based methods. Thus, the existing procedures are ex- 
pected to be sensitive to outliers and their efficiency may be significantly 
improved for many commonly used non-normal errors. In this paper, we pro- 
pose new estimation procedures for model (1.1). This paper contains three 
major developments: (a) semiparametric quantile regression; (b) semipara- 
metric composite quantile regression; (c) adaptive penalization methods for 
achieving sparsity in semiparametric composite quantile regression. 

Quantile regression is often considered as an alternative to least-squares 
in the literature. For a complete review on quantile regression, see Koenker 
[17]. Quantile-regression-based inference procedures have been considered in 
the literature; see, for example, Cai and Xu [2], He and Shi [10], He, Zhu 
and Fung [11], Lee [19], among others. In Section 2, we propose a new semi- 
parametric quantile regression procedure for model (1.1). We investigate the 
sampling properties of the proposed method and establish its asymptotic
normality. When applying semiparametric quantile regression to model (1.1), we 
observe that all quantile regression estimators can estimate $\alpha(u)$ and $\beta$ at 
the optimal rate of convergence. This fact motivates us to combine the
information across multiple quantile estimates to obtain improved estimates 
of $\alpha(u)$ and $\beta$. Such an idea has been studied for the parametric regression 
model in Zou and Yuan [35] and it leads to the composite quantile regression 
(CQR) estimator that is shown to enjoy nice asymptotic efficiency proper- 
ties compared with the classical least-squares estimator. In Section 3, we 
propose the semiparametric composite quantile regression (semi-CQR) es- 
timators for estimating both nonparametric and parametric parts in model 
(1.1). We show that the semi-CQR estimators achieve the best convergence 
rates. We also prove the asymptotic normality of the semi-CQR estimators. 
The asymptotic theory shows that, compared with the semiparametric least- 
squares estimators, the semi-CQR estimators can have substantial efficiency 
gain for many non-normal errors and only lose a small amount of efficiency 
for normal errors. Moreover, the relative efficiency is at least 88.9% for es- 
timating varying-coefficient functions and is at least 86.4% for estimating 
parametric components. 

In practice, there are often many covariates in the parametric part of 
model (1.1). With high-dimensional covariates, sparse modeling is often con- 
sidered superior, owing to enhanced model predictability and interpretability 
[7]. Variable selection for model (1.1) is challenging because it involves both 
nonparametric and parametric parts. Traditional variable selection methods, 
such as stepwise regression or best subset variable selection, do not work ef- 
fectively for the semiparametric model because they need to choose smooth- 
ing parameters for each submodel and cannot cope with high-dimensionality. 
In Section 4, we develop an effective variable selection procedure to select 
significant parametric components in model (1.1). We demonstrate that the 
proposed procedure possesses the oracle property, in the sense of Fan and 
Li [6]. 

In Section 5, we conduct simulation studies to examine the finite-sample 
performance of the proposed procedures. The proposed methods are illus- 
trated with the plasma beta-carotene level data. Regularity conditions and 
technical proofs are given in Section 6. 

2. Semiparametric quantile regression. In this section, we develop the 
semiparametric quantile regression method and theory. Let $\rho_\tau(r) = \tau r - rI(r < 0)$
be the check loss function at $\tau \in (0, 1)$. Quantile regression is often 
used to estimate the conditional quantile functions of $Y$, 

$Q_\tau(u, x, z) = \arg\min_a E\{\rho_\tau(Y - a) \mid (U, X, Z) = (u, x, z)\}.$

The semiparametric varying-coefficient partially linear model assumes that 
the conditional quantile function is expressed as
$Q_\tau(u, x, z) = \alpha_{0,\tau}(u) + x^T\alpha_\tau(u) + z^T\beta_\tau$.
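
As a minimal sketch of the check loss in action (the function names and the toy data are illustrative assumptions, not part of the paper), the following shows that minimizing the empirical check loss over a constant recovers a sample quantile:

```python
def check_loss(r, tau):
    """Check loss rho_tau(r) = tau * r - r * I(r < 0)."""
    return tau * r - (r if r < 0 else 0.0)


def empirical_quantile(values, tau):
    """Minimise sum_i rho_tau(y_i - a) over candidates a taken from the data.

    A minimiser of the empirical check loss can always be found among the
    observed values, so a search over the data points is sufficient.
    """
    return min(values, key=lambda a: sum(check_loss(y - a, tau) for y in values))


data = [5.0, 1.0, 4.0, 2.0, 3.0]
median = empirical_quantile(data, 0.5)
lower_quartile = empirical_quantile(data, 0.25)
```

With tau = 0.5 the loss is half the absolute error, so the minimizer is the sample median; other values of tau tilt the loss toward one tail.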

Suppose that $\{U_i, X_i, Z_i, Y_i\}$, $i = 1, \ldots, n$, is an independent and
identically distributed sample from the model 

(2.1) $Y = \alpha_{0,\tau}(U) + X^T\alpha_\tau(U) + Z^T\beta_\tau + \varepsilon_\tau,$

where $\varepsilon_\tau$ is random error with conditional $\tau$th quantile being zero. We obtain 
quantile regression estimates of $\alpha_{0,\tau}(\cdot)$, $\alpha_\tau(\cdot)$ and $\beta_\tau$ by minimizing the 
quantile loss function 

(2.2) $\sum_{i=1}^{n} \rho_\tau\{Y_i - \alpha_0(U_i) - X_i^T\alpha(U_i) - Z_i^T\beta\}.$

Because (2.2) involves both nonparametric and parametric components, and 
because they can be estimated at different rates of convergence, we propose 
a three-stage estimation procedure. In the first stage, we employ local linear 
regression techniques to derive initial estimates of $\alpha_{0,\tau}(\cdot)$, $\alpha_\tau(\cdot)$ and $\beta_\tau$. 
Then, in the second and third stages, we further improve the estimation 
efficiency of the initial estimates for $\beta_\tau$ and $(\alpha_{0,\tau}(\cdot), \alpha_\tau(\cdot))$, respectively. 
For $U$ in the neighborhood of $u$, we use a local linear approximation 

$\alpha_j(U) \approx \alpha_j(u) + \alpha_j'(u)(U - u) \equiv a_j + b_j(U - u)$

for $j = 0, \ldots, d_1$. Let $\{\tilde a_{0,\tau}, \tilde b_{0,\tau}, \tilde a_\tau, \tilde b_\tau, \tilde\beta_\tau\}$ be the minimizer of the local 
weighted quantile loss function 

$\sum_{i=1}^{n} \rho_\tau[Y_i - a_0 - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\} - Z_i^T\beta] K_h(U_i - u),$

where $a = (a_1, \ldots, a_{d_1})^T$, $b = (b_1, \ldots, b_{d_1})^T$, $K(\cdot)$ is a given kernel function 
and $K_h(\cdot) = K(\cdot/h)/h$ with a bandwidth $h$. Then, 

$\tilde\alpha_{0,\tau}(u) = \tilde a_{0,\tau}, \qquad \tilde\alpha_\tau(u) = \tilde a_\tau.$

We take $\{\tilde\alpha_{0,\tau}(u), \tilde\alpha_\tau(u), \tilde\beta_\tau\}$ as the initial estimates. 
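
The kernel weights $K_h(U_i - u)$ and the usual kernel constants $\mu_j = \int u^j K(u)\,du$ and $\nu_j = \int u^j K^2(u)\,du$ can be illustrated with a small sketch. It assumes the Epanechnikov kernel used later in the numerical studies; the function names and the midpoint-rule integration are illustrative choices, not the authors' code.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+, a symmetric density on [-1, 1]."""
    return 0.75 * max(1.0 - u * u, 0.0)


def scaled_kernel(t, h):
    """K_h(t) = K(t / h) / h for bandwidth h > 0."""
    return epanechnikov(t / h) / h


def kernel_moment(j, squared=False, steps=20000):
    """Midpoint-rule approximation of mu_j (squared=False) or nu_j (squared=True)."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        u = -1.0 + (i + 0.5) * width
        k = epanechnikov(u)
        total += (u ** j) * (k * k if squared else k) * width
    return total


mu_2 = kernel_moment(2)                 # second moment of K
nu_0 = kernel_moment(0, squared=True)   # integral of K^2
```

For this kernel the exact values are $\mu_2 = 1/5$ and $\nu_0 = 3/5$, which the numerical approximation reproduces closely.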

We now provide theoretical justifications for the initial estimates. First, we 
give some notation. Let $f_\tau(\cdot \mid u, x, z)$ and $F_\tau(\cdot \mid u, x, z)$ be the density function 
and cumulative distribution function of the error conditional on $(U, X, Z) = 
(u, x, z)$, respectively. Denote by $f_U(\cdot)$ the marginal density function of the 
covariate $U$. The kernel $K(\cdot)$ is chosen as a symmetric density function and 
we let 

$\mu_j = \int u^j K(u)\,du \quad\text{and}\quad \nu_j = \int u^j K^2(u)\,du, \qquad j = 0, 1, 2, \ldots.$

We then have the following result. 
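Theorem 2.1. Under the regularity conditions given in Section 7, if 
$h \to 0$ and $nh \to \infty$ as $n \to \infty$, then 

(2.3) $\sqrt{nh}\left[\begin{pmatrix}\tilde\alpha_{0,\tau}(u) - \alpha_{0,\tau}(u)\\ \tilde\alpha_\tau(u) - \alpha_\tau(u)\\ \tilde\beta_\tau - \beta_\tau\end{pmatrix} - \frac{\mu_2 h^2}{2}\begin{pmatrix}\alpha''_{0,\tau}(u)\\ \alpha''_\tau(u)\\ 0\end{pmatrix}\right] \xrightarrow{D} N\left(0, \frac{\nu_0\tau(1-\tau)}{f_U(u)} A_1^{-1}(u)B_1(u)A_1^{-1}(u)\right),$

where $A_1(u) = E[f_\tau(0 \mid U, X, Z)(1, X^T, Z^T)^T(1, X^T, Z^T) \mid U = u]$ and
$B_1(u) = E[(1, X^T, Z^T)^T(1, X^T, Z^T) \mid U = u]$.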

Theorem 2.1 implies that $\tilde\beta_\tau$ is a $\sqrt{nh}$-consistent estimator; this is
because we only use data in a local neighborhood of $u$ to estimate $\beta_\tau$. Define 
$Y^*_{i,\tau} = Y_i - \tilde\alpha_{0,\tau}(U_i) - X_i^T\tilde\alpha_\tau(U_i)$ and compute an improved estimator of 
$\beta_\tau$ by 

(2.4) $\hat\beta_\tau = \arg\min_\beta \sum_{i=1}^{n} \rho_\tau(Y^*_{i,\tau} - Z_i^T\beta).$

We call it the semi-QR estimator of $\beta_\tau$. The next theorem shows the
asymptotic properties of $\hat\beta_\tau$. 

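Theorem 2.2. Let $\xi_\tau(u, x, z) = E[f_\tau(0 \mid U, X, Z)Z(1, X^T, 0) \mid U = u]
A_1^{-1}(u)(1, x^T, z^T)^T$. Under the regularity conditions given in Section 7, if 
$nh^4 \to 0$ and $nh^2/\log(1/h) \to \infty$ as $n \to \infty$, then the asymptotic
distribution of $\hat\beta_\tau$ is given by 

(2.5) $\sqrt{n}(\hat\beta_\tau - \beta_\tau) \xrightarrow{D} N(0, S_\tau^{-1}\Delta_\tau S_\tau^{-1}),$

where $S_\tau = E[f_\tau(0 \mid U, X, Z)ZZ^T]$ and
$\Delta_\tau = \tau(1 - \tau)E[\{Z - \xi_\tau(U, X, Z)\}\{Z - \xi_\tau(U, X, Z)\}^T]$.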

The optimal bandwidth in Theorem 2.1 is $h \sim n^{-1/5}$. This bandwidth does 
not satisfy the condition in Theorem 2.2. Hence, in order to obtain the
root-$n$ consistency and asymptotic normality for $\hat\beta_\tau$, undersmoothing for $\tilde\alpha_{0,\tau}(u)$ 
and $\tilde\alpha_\tau(u)$ is necessary. This is a common requirement in semiparametric 
models; see Carroll et al. [3] for a detailed discussion. 
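
A short worked calculation shows why the two requirements conflict. It reads the conditions of Theorem 2.2 as $nh^4 \to 0$ and $nh^2/\log(1/h) \to \infty$, and the power-law form $h = n^{-a}$ is an illustrative assumption rather than a choice made in the text.

```latex
% Illustration only: bandwidths of the form h = n^{-a}, a > 0.
\begin{align*}
h = n^{-1/5}: &\quad nh^4 = n^{1/5} \to \infty, \text{ so } nh^4 \to 0 \text{ fails};\\
h = n^{-a}:   &\quad nh^4 = n^{1-4a} \to 0 \iff a > 1/4,\\
              &\quad \frac{nh^2}{\log(1/h)} = \frac{n^{1-2a}}{a\log n} \to \infty \iff a < 1/2.
\end{align*}
```

Under this assumption, any rate with $1/4 < a < 1/2$ (for example $h = n^{-1/3}$) meets both conditions while undersmoothing relative to $n^{-1/5}$.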

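After obtaining the root-$n$ consistent estimator $\hat\beta_\tau$, we can further
improve the efficiency of $\tilde\alpha_{0,\tau}(u)$ and $\tilde\alpha_\tau(u)$. To this end, let $\{\hat a_{0,\tau}, \hat b_{0,\tau}, \hat a_\tau, \hat b_\tau\}$ 
be the minimizer of 

$\sum_{i=1}^{n} \rho_\tau[Y_i - Z_i^T\hat\beta_\tau - a_0 - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\}] K_h(U_i - u).$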






B. KAI, R. LI AND H. ZOU 



We define 

(2.6) $\hat\alpha_{0,\tau}(u) = \hat a_{0,\tau}, \qquad \hat\alpha_\tau(u) = \hat a_\tau.$

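Theorem 2.3. Under the regularity conditions given in Section 7, if 
$h \to 0$ and $nh \to \infty$ as $n \to \infty$, then 

(2.7) $\sqrt{nh}\left[\begin{pmatrix}\hat\alpha_{0,\tau}(u) - \alpha_{0,\tau}(u)\\ \hat\alpha_\tau(u) - \alpha_\tau(u)\end{pmatrix} - \frac{\mu_2 h^2}{2}\begin{pmatrix}\alpha''_{0,\tau}(u)\\ \alpha''_\tau(u)\end{pmatrix}\right] \xrightarrow{D} N\left(0, \frac{\nu_0\tau(1-\tau)}{f_U(u)} A_2^{-1}(u)B_2(u)A_2^{-1}(u)\right),$

where $A_2(u) = E[f_\tau(0 \mid U, X, Z)(1, X^T)^T(1, X^T) \mid U = u]$ and
$B_2(u) = E[(1, X^T)^T(1, X^T) \mid U = u]$.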

Theorem 2.3 shows that $\hat\alpha_{0,\tau}(u)$ and $\hat\alpha_\tau(u)$ have the same conditional 
asymptotic biases as $\tilde\alpha_{0,\tau}(u)$ and $\tilde\alpha_\tau(u)$, while they have smaller conditional 
asymptotic variances than $\tilde\alpha_{0,\tau}(u)$ and $\tilde\alpha_\tau(u)$, respectively. Hence, they are 
asymptotically more efficient than $\tilde\alpha_{0,\tau}(u)$ and $\tilde\alpha_\tau(u)$. 

3. Semiparametric composite quantile regression. The analysis of semi- 
parametric quantile regression in Section 2 provides a solid foundation for de- 
veloping the semiparametric composite quantile regression (CQR) estimates. 
We consider the connection between the quantile regression model (2.1) and 
model (1.1) in the situations where the random error $\varepsilon$ is independent of 
$(U, X, Z)$. Let us assume that $Y = \alpha_0(U) + X^T\alpha(U) + Z^T\beta + \varepsilon$, where $\varepsilon$ 
follows a distribution $F$ with mean zero. In such situations, $Q_\tau(u, x, z) = 
\alpha_0(u) + c_\tau + x^T\alpha(u) + z^T\beta$, where $c_\tau = F^{-1}(\tau)$. Thus, all quantile
regression estimates [$\hat\alpha_\tau(u)$ and $\hat\beta_\tau$ for all $\tau$] estimate the same target quantities 
[$\alpha(u)$ and $\beta$] with the optimal rate of convergence. Therefore, we can
consider combining the information across multiple quantile estimates to obtain 
improved estimates of $\alpha(u)$ and $\beta$. Such an idea has been studied for the 
parametric regression model, in Zou and Yuan [35], and it leads to the 
CQR estimator that is shown to enjoy nice asymptotic efficiency proper- 
ties compared with the classical least-squares estimator. Kai, Li and Zou 
[13] proposed the local polynomial CQR estimator for estimating the non- 
parametric regression function and its derivative. It is shown that the local 
CQR method can significantly improve the estimation efficiency of the local 
least-squares estimator for commonly used non-normal error distributions. 
Inspired by these nice results, we study semiparametric CQR estimates for 
model (1.1). 

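Suppose $\{U_i, X_i, Z_i, Y_i, i = 1, \ldots, n\}$ is an independent and identically
distributed sample from model (1.1) and $\varepsilon$ has mean zero. For a given $q$, let 
$\tau_k = k/(q + 1)$ for $k = 1, 2, \ldots, q$. The CQR procedure estimates $\alpha_0(\cdot)$, $\alpha(\cdot)$ 
and $\beta$ by minimizing the CQR loss function, 

$\sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\{Y_i - \alpha_{0k}(U_i) - X_i^T\alpha(U_i) - Z_i^T\beta\},$

where $\alpha_{0k}(u) = \alpha_0(u) + c_{\tau_k}$ is the baseline shifted by the $\tau_k$th error quantile.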

To this end, we adapt the three-stage estimation procedure from Section 2. 
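
To make the composite loss concrete, here is a minimal sketch: it builds the equally spaced levels $\tau_k = k/(q+1)$ and sums the check loss over all levels, with one intercept per level and a shared slope part. The function names and the convention of passing residuals that already exclude the intercept are illustrative assumptions.

```python
def quantile_levels(q):
    """Equally spaced quantile levels tau_k = k / (q + 1), k = 1, ..., q."""
    return [k / (q + 1) for k in range(1, q + 1)]


def check_loss(r, tau):
    """Check loss rho_tau(r) = tau * r - r * I(r < 0)."""
    return tau * r - (r if r < 0 else 0.0)


def cqr_loss(partial_residuals, intercepts):
    """Composite quantile loss.

    partial_residuals: y_i minus the slope part (shared across levels).
    intercepts: one intercept a_{0k} per quantile level; q = len(intercepts).
    """
    taus = quantile_levels(len(intercepts))
    return sum(
        check_loss(r - a0k, tau)
        for tau, a0k in zip(taus, intercepts)
        for r in partial_residuals
    )


loss = cqr_loss([-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5])  # q = 3 levels
```

Only the intercepts differ across levels, which is what lets the composite loss pool information about the shared coefficients.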

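First, we derive good initial semi-CQR estimates. Let $\{\tilde a_0, \tilde b_0, \tilde a, \tilde b, \tilde\beta\}$ be 
the minimizer of the local CQR loss function 

$\sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}[Y_i - a_{0k} - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\} - Z_i^T\beta] K_h(U_i - u),$

where $a_0 = (a_{01}, \ldots, a_{0q})^T$, $a = (a_1, \ldots, a_{d_1})^T$ and $b = (b_1, \ldots, b_{d_1})^T$. Initial 
estimates of $\alpha_0(u)$ and $\alpha(u)$ are then given by 

(3.1) $\tilde\alpha_0(u) = \frac{1}{q}\sum_{k=1}^{q} \tilde a_{0k}, \qquad \tilde\alpha(u) = \tilde a.$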

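To investigate asymptotic behaviors of $\tilde\alpha_0(u)$, $\tilde\alpha(u)$ and $\tilde\beta$, let us begin 
with some new notation. Denote by $f(\cdot)$ and $F(\cdot)$ the density function and 
cumulative distribution function of the error, respectively. Let $c_k = F^{-1}(\tau_k)$ 
and $C$ be a $q \times q$ diagonal matrix with $C_{jj} = f(c_j)$. Write $\mathbf{c} = C\mathbf{1}$, $c = \mathbf{1}^T C\mathbf{1}$ 
and 

$D_1(u) = E\left[\begin{pmatrix} C & \mathbf{c}X^T & \mathbf{c}Z^T\\ X\mathbf{c}^T & cXX^T & cXZ^T\\ Z\mathbf{c}^T & cZX^T & cZZ^T \end{pmatrix} \,\middle|\, U = u\right].$

Let $\tau_{kk'} = \tau_k \wedge \tau_{k'} - \tau_k\tau_{k'}$ and let $T$ be a $q \times q$ matrix with the $(k, k')$ element 
being $\tau_{kk'}$. Write $\mathbf{t} = T\mathbf{1}$, $t = \mathbf{1}^T T\mathbf{1}$ and 

$\Sigma_1(u) = E\left[\begin{pmatrix} T & \mathbf{t}X^T & \mathbf{t}Z^T\\ X\mathbf{t}^T & tXX^T & tXZ^T\\ Z\mathbf{t}^T & tZX^T & tZZ^T \end{pmatrix} \,\middle|\, U = u\right].$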

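The following theorem describes the asymptotic sampling distribution of 
$\{\tilde a_0, \tilde b_0, \tilde a, \tilde b, \tilde\beta\}$. 

Theorem 3.1. Under the regularity conditions given in Section 7, if 
$h \to 0$ and $nh \to \infty$ as $n \to \infty$, then 

$\sqrt{nh}\left[\begin{pmatrix}\tilde a_0 - \boldsymbol\alpha_0(u)\\ \tilde a - \alpha(u)\\ \tilde\beta - \beta_0\end{pmatrix} - \frac{\mu_2 h^2}{2}\begin{pmatrix}\boldsymbol\alpha_0''(u)\\ \alpha''(u)\\ 0\end{pmatrix}\right] \xrightarrow{D} N\left(0, \frac{\nu_0}{f_U(u)} D_1^{-1}(u)\Sigma_1(u)D_1^{-1}(u)\right),$

where $\boldsymbol\alpha_0(u) = (\alpha_0(u) + c_1, \ldots, \alpha_0(u) + c_q)^T$ and $\beta_0$ is the true value of $\beta$. 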

With the initial estimates in hand, we are now ready to derive a $\sqrt{n}$-consistent
estimator of $\beta$ by 

(3.2) $\hat\beta = \arg\min_\beta \sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde\alpha(U_i) - Z_i^T\beta\},$

which is called the semi-CQR estimator of $\beta$. 

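Theorem 3.2. Under the regularity conditions given in Section 7, if 
$nh^4 \to 0$ and $nh^2/\log(1/h) \to \infty$ as $n \to \infty$, then the asymptotic distribution 
of $\hat\beta$ is given by 

(3.3) $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{D} N\left(0, \frac{1}{c^2} S^{-1}\Delta S^{-1}\right),$

where $S = E(ZZ^T)$ and $\Delta = \sum_{k=1}^{q}\sum_{k'=1}^{q} \tau_{kk'} E[\{Z - \delta_k(U, X, Z)\}\{Z - \delta_{k'}(U, X, Z)\}^T]$,
with $\delta_k(u, x, z)$ being the $k$th column of the $d_2 \times q$ matrix 

$\delta(u, x, z) = E[Z(\mathbf{c}^T, cX^T, 0) \mid U = u] D_1^{-1}(u)(I_q, \mathbf{1}x^T, \mathbf{1}z^T)^T.$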

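Finally, $\hat\beta$ can also be used to further refine the estimates for the
nonparametric part. Let $\{\hat a_0, \hat b_0, \hat a, \hat b\}$ be the minimizer of 

$\sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}[Y_i - Z_i^T\hat\beta - a_{0k} - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\}] K_h(U_i - u),$

where $a_0 = (a_{01}, \ldots, a_{0q})^T$. We then define the semi-CQR estimators for 
$\alpha_0(u)$ and $\alpha(u)$ as 

(3.4) $\hat\alpha_0(u) = \frac{1}{q}\sum_{k=1}^{q} \hat a_{0k}, \qquad \hat\alpha(u) = \hat a.$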

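We now study the asymptotic properties of $\hat\alpha_0(u)$ and $\hat\alpha(u)$. Let 

$D_2(u) = E\left[\begin{pmatrix} C & \mathbf{c}X^T\\ X\mathbf{c}^T & cXX^T \end{pmatrix} \,\middle|\, U = u\right] \quad\text{and}\quad \Sigma_2(u) = E\left[\begin{pmatrix} T & \mathbf{t}X^T\\ X\mathbf{t}^T & tXX^T \end{pmatrix} \,\middle|\, U = u\right].$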

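Theorem 3.3. Under the regularity conditions given in Section 7, if 
$h \to 0$ and $nh \to \infty$ as $n \to \infty$, the asymptotic distributions of $\hat\alpha_0(u)$ and 
$\hat\alpha(u)$ are given by 

$\sqrt{nh}\left\{\hat\alpha_0(u) - \alpha_0(u) - \frac{1}{q}\sum_{k=1}^{q} c_k - \frac{\mu_2 h^2}{2}\alpha_0''(u)\right\} \xrightarrow{D} N\left(0, \frac{\nu_0}{f_U(u)}\frac{1}{q^2}\mathbf{1}^T[D_2^{-1}(u)\Sigma_2(u)D_2^{-1}(u)]_{11}\mathbf{1}\right)$

and 

$\sqrt{nh}\left\{\hat\alpha(u) - \alpha(u) - \frac{\mu_2 h^2}{2}\alpha''(u)\right\} \xrightarrow{D} N\left(0, \frac{\nu_0}{f_U(u)}[D_2^{-1}(u)\Sigma_2(u)D_2^{-1}(u)]_{22}\right),$

where $[\cdot]_{11}$ denotes the upper-left $q \times q$ submatrix and $[\cdot]_{22}$ denotes the
lower-right $d_1 \times d_1$ submatrix. 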

Remark 1. $\alpha(u)$ and $\beta$ represent the contributions of covariates. They 
are the central quantities of interest in semiparametric inference. Li and 
Liang [21] studied the least-squares-based semiparametric estimation, which 
we will refer to as "semi-LS" in this work. The major advantage of semi-CQR 
over the classical semi-LS is that semi-CQR has competitive asymptotic ef- 
ficiency. Furthermore, semi-CQR is also more stable and robust. Intuitively 
speaking, these advantages come from the fact that semi-CQR utilizes in- 
formation shared across multiple quantile functions, whereas semi-LS only 
uses the information contained in the mean function. 

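To elaborate on Remark 1, we discuss the efficiency of semi-CQR relative
to semi-LS. Note that $E(Y \mid U) = \alpha_0(U) + E(X \mid U)^T\alpha(U) + E(Z \mid U)^T\beta$.
It then follows that $Y = E(Y \mid U) + \{X - E(X \mid U)\}^T\alpha(U) + \{Z - E(Z \mid U)\}^T\beta + \varepsilon$.
Without loss of generality, let us consider the situation 
in which $E(X \mid U) = 0$ and $E(Z \mid U) = 0$. Then, all of $D_1(u)$, $D_2(u)$, $\Sigma_1(u)$ and 
$\Sigma_2(u)$ become block diagonal matrices. Thus, from Theorem 3.3, we have 

$\sqrt{nh}\left\{\hat\alpha_0(u) - \alpha_0(u) - \frac{1}{q}\sum_{k=1}^{q} c_k - \frac{\mu_2 h^2}{2}\alpha_0''(u)\right\} \xrightarrow{D} N\left(0, \frac{\nu_0}{f_U(u)} R_1(q)\right)$

and 

$\sqrt{nh}\left\{\hat\alpha(u) - \alpha(u) - \frac{\mu_2 h^2}{2}\alpha''(u)\right\} \xrightarrow{D} N\left(0, \frac{\nu_0}{f_U(u)} R_2(q) E^{-1}(XX^T \mid U = u)\right),$

where 

$R_1(q) = \frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q} \frac{\tau_{kk'}}{f(c_k)f(c_{k'})}$

and 

$R_2(q) = \frac{\sum_{k=1}^{q}\sum_{k'=1}^{q} \tau_{kk'}}{\{\sum_{k=1}^{q} f(c_k)\}^2}.$
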
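Note that 

$\delta(u, x, z) = E\{(ZX^T, 0) \mid U = u\}\left[E\left\{\begin{pmatrix} XX^T & XZ^T\\ ZX^T & ZZ^T \end{pmatrix} \,\middle|\, U = u\right\}\right]^{-1}(\mathbf{1}x^T, \mathbf{1}z^T)^T,$

with all columns of $\delta(u, x, z)$ the same. Thus, $\Delta = t\Delta_0$ with
$\Delta_0 = E[\{Z - \delta_1(U, X, Z)\}\{Z - \delta_1(U, X, Z)\}^T]$. It is easy to show that
$E\{\delta_1(U, X, Z)Z^T\} = 0$ and we then have 

$\Delta_0 = E[E(ZZ^T \mid U)\{E(ZZ^T \mid U) - E(ZX^T \mid U)E(XX^T \mid U)^{-1}E(XZ^T \mid U)\}^{-1}E(ZZ^T \mid U)].$

Therefore, 

(3.5) $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{D} N(0, R_2(q) S^{-1}\Delta_0 S^{-1}).$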

If we replace $R_2(q)$ with 1 in the two limiting distributions above for
$\hat\alpha(u)$ and $\hat\beta$, we end up with 
the asymptotic normal distributions of the semi-LS estimators, as studied in 
Li and Liang [21]. Thus, $R_2(q)$ determines the asymptotic relative efficiency 
(ARE) of semi-CQR relative to semi-LS. By direct calculations, we see that 
the ARE for estimating $\alpha(u)$ is $R_2(q)^{-4/5}$ and the ARE for estimating $\beta$ is 
$R_2(q)^{-1}$. It is interesting to see that the same factor, $R_2(q)$, also appears in 
the asymptotic efficiency analysis of parametric CQR [35] and nonparametric 
local CQR smoothing [13]. The basic message is that, with a relatively large 
$q$ ($q \ge 9$), $R_2(q)$ is very close to 1 for the normal errors, but can be much 
smaller than 1, meaning a huge gain in efficiency, for the commonly seen 
non-normal errors. It is also shown in [13] that $\lim_{q\to\infty} R_2(q)^{-1} \ge 0.864$ and 
hence $\lim_{q\to\infty} R_2(q)^{-4/5} \ge 0.8896$, which implies that when a large $q$ is used, 
the ARE is at least 88.9% for estimating varying-coefficient functions and 
at least 86.4% for estimating parametric components. 
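
The efficiency factors can be evaluated numerically. The sketch below assumes the forms $R_1(q) = q^{-2}\sum_{k,k'}\tau_{kk'}/\{f(c_k)f(c_{k'})\}$ and $R_2(q) = \sum_{k,k'}\tau_{kk'}/\{\sum_k f(c_k)\}^2$ with $\tau_{kk'} = \tau_k \wedge \tau_{k'} - \tau_k\tau_{k'}$, and uses standard normal errors through Python's `statistics.NormalDist`; the function name and default error law are illustrative assumptions.

```python
from statistics import NormalDist


def efficiency_factors(q, dist=NormalDist()):
    """Return (R_1(q), R_2(q)) for an error law exposing pdf() and inv_cdf()."""
    taus = [k / (q + 1) for k in range(1, q + 1)]
    # tau_{kk'} = min(tau_k, tau_k') - tau_k * tau_k'
    tau_mat = [[min(a, b) - a * b for b in taus] for a in taus]
    # f(c_k): error density evaluated at the tau_k-th error quantile
    dens = [dist.pdf(dist.inv_cdf(t)) for t in taus]
    r1 = sum(
        tau_mat[k][l] / (dens[k] * dens[l]) for k in range(q) for l in range(q)
    ) / q ** 2
    r2 = sum(sum(row) for row in tau_mat) / sum(dens) ** 2
    return r1, r2


r1_9, r2_9 = efficiency_factors(9)
are_beta = 1.0 / r2_9          # ARE for the parametric part, R_2(q)^{-1}
are_alpha = r2_9 ** (-0.8)     # ARE for the coefficient functions, R_2(q)^{-4/5}
```

For a single quantile level (q = 1, the median) and standard normal errors both factors reduce to $\pi/2$, the familiar variance inflation of median regression; with q = 9 the value of $R_2$ is much closer to 1.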

Remark 2. The baseline function estimator $\hat\alpha_0(u)$ converges to $\alpha_0(u)$ 
plus the average of uniform quantiles of the error distribution. Therefore, 
the bias term is zero when the error distribution is symmetric. Even for 
asymmetric distributions, the additional bias term converges to the mean 
of the error, which is zero, as $q$ becomes large. Nevertheless, its asymptotic 
variance differs from that of the semi-LS estimator by a factor of $R_1(q)$. The 
study in Kai, Li and Zou [13] shows that $R_1(q)$ approaches 1 as $q$ becomes 
large and $R_1(q)$ could be much smaller than 1 with a smaller $q$ ($q \le 9$) for 
commonly used non-normal distributions. 



Remark 3. The factors $R_1(q)$ and $R_2(q)$ only depend on the error
distribution. We have observed from our simulation study that, as a function 
of $q$, the maximum of $R_2(q)$ is often closely approximated by $R_2(q = 9)$. 
Hence, if we only care about the inference of $\alpha(u)$ and $\beta$, then $q = 9$ seems 
to be a good default value. On the other hand, $R_1(q = 5)$ is often close 
to the maximum of $R_1(q)$ based on our numerical study and hence $q = 5$ 
is a good default value for estimating the baseline function. If prediction 
accuracy is the primary interest, then we should use a proper $q$ to maximize 
the total contributions from $R_1(q)$ and $R_2(q)$. Practically speaking, one can 
choose a $q$ from the interval [5, 9] by some popular tuning methods such as 
$K$-fold cross-validation. However, we do not expect these CQR models to 
have significant differences in terms of model fitting and prediction because, 
in many cases, $R_1(q)$ and $R_2(q)$ vary little in the interval [5, 9]. 

4. Variable selection. Variable selection is a crucial step in high-dimensional 
modeling. Various powerful penalization methods have been developed for 
variable selection in parametric models; see Fan and Li [7] for a good review. 
In the literature, there are only a few papers on variable selection in semi- 
parametric regression models. Li and Liang [21] proposed the nonconcave 
penalized quasi-likelihood method for variable selection in semiparametric 
varying-coefficient models. In this section, we study the penalized semipara- 
metric CQR estimator. 

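Let $p_{\lambda_n}(\cdot)$ be a pre-specified penalty function with regularization
parameter $\lambda_n$. We consider the penalized CQR loss 

(4.1) $\sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde\alpha(U_i) - Z_i^T\beta\} + nq\sum_{j=1}^{d_2} p_{\lambda_n}(|\beta_j|).$

By minimizing the above objective function with a proper penalty parameter 
$\lambda_n$, we can get a sparse estimator of $\beta$ and hence conduct variable selection. 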

Fan and Li [6] suggested using a concave penalty function since it is able 
to produce an oracular estimator, that is, the penalized estimator performs 
as well as if the subset model were known in advance. However, optimiz- 
ing (4.1) with a concave penalty function is very challenging because the 
objective function is nonconvex and both loss and penalty parts are non- 
differentiable. Various numerical algorithms have been proposed to address 
the computational difficulties. Fan and Li [6] suggested using local quadratic 
approximation (LQA) to substitute for the penalty function and then op- 
timizing using the Newton-Raphson algorithm. Hunter and Li [12] further 
proposed a perturbed version of LQA to alleviate one drawback of LQA. 
Recently, Zou and Li [34] proposed a new unified algorithm based on local 
linear approximation (LLA) and further suggested using the one-step LLA 
estimator because the one-step LLA automatically adopts a sparse
representation and is as efficient as the fully iterated LLA estimator. Thus, the 
one-step LLA estimator is computationally and statistically efficient. 

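We propose to follow the one-step sparse estimate scheme in Zou and Li 
[34] to derive a one-step sparse semi-CQR estimator, as follows. First, we 
compute the unpenalized semi-CQR estimate $\hat\beta^{(0)}$, as described in Section 
3. We then define 

$G_n(\beta) = \sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde\alpha(U_i) - Z_i^T\beta\} + nq\sum_{j=1}^{d_2} p'_{\lambda_n}(|\hat\beta_j^{(0)}|)|\beta_j|.$

We define $\hat\beta^{\mathrm{OSE}} = \arg\min_\beta G_n(\beta)$ and call this the one-step sparse
semi-CQR estimator. Indeed, this is a weighted $L_1$ regularization procedure. 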
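
The weights in such a one-step scheme come from the derivative of the penalty evaluated at the unpenalized estimate. As a minimal sketch for the SCAD penalty of Fan and Li [6], with their suggested constant a = 3.7 (the function names and the example coefficients are illustrative assumptions):

```python
def scad_derivative(theta, lam, a=3.7):
    """SCAD derivative p'_lam(theta) for theta >= 0 (absolute value taken first).

    p'_lam(theta) = lam                          if theta <= lam
                  = (a*lam - theta)_+ / (a - 1)  if theta >  lam
    """
    theta = abs(theta)
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1.0)


def one_step_weights(beta_init, lam, a=3.7):
    """L1 weights p'_lam(|beta_j^(0)|) applied to |beta_j| in the one-step objective."""
    return [scad_derivative(b, lam, a) for b in beta_init]


# Hypothetical unpenalized estimates: one large, one small, one zero coefficient.
weights = one_step_weights([3.0, 0.05, 0.0], lam=0.5)
```

Large initial coefficients receive zero weight and are left unpenalized, while small ones receive the full weight, which is how the weighted $L_1$ step produces sparsity without shrinking strong signals.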

We now show that the one-step sparse semi-CQR estimator $\hat\beta^{\mathrm{OSE}}$ enjoys 
the oracle property. This property holds for a wide class of concave penalties. 
To fix ideas, we focus on the SCAD penalty from Fan and Li [6], 
which is perhaps the most popular concave penalty in the literature. Let 
$\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ denote the true value of $\beta$, where $\beta_{10}$ is an $s$-vector. Without 
loss of generality, we assume that $\beta_{20} = 0$ and that $\beta_{10}$ contains all nonzero 
components of $\beta_0$. Furthermore, let $Z_1$ be the first $s$ elements of $Z$ and 
define 

$\lambda(u, x, z) = E[Z_1(\mathbf{c}^T, cX^T, 0) \mid U = u] D_1^{-1}(u)(I_q, \mathbf{1}x^T, \mathbf{1}z^T)^T.$

Theorem 4.1 (Oracle property). Let $p_\lambda(\cdot)$ be the SCAD penalty. Assume
that the regularity conditions (B1)-(B6) given in the Appendix hold. If 
$\sqrt{n}\lambda_n \to \infty$, $\lambda_n \to 0$, $nh^4 \to 0$ and $nh^2/\log(1/h) \to \infty$ as $n \to \infty$, then the 
one-step semi-CQR estimator $\hat\beta^{\mathrm{OSE}}$ must satisfy: 

(a) sparsity, that is, $\hat\beta_2^{\mathrm{OSE}} = 0$, with probability tending to one; 

(b) asymptotic normality, that is, 

(4.2) $\sqrt{n}(\hat\beta_1^{\mathrm{OSE}} - \beta_{10}) \xrightarrow{D} N\left(0, \frac{1}{c^2} S_1^{-1}\Lambda S_1^{-1}\right),$

where $S_1 = E(Z_1Z_1^T)$ and $\Lambda = \sum_{k=1}^{q}\sum_{k'=1}^{q} \tau_{kk'} E[\{Z_1 - \lambda_k(U, X, Z)\}\{Z_1 - \lambda_{k'}(U, X, Z)\}^T]$,
with $\lambda_k(u, x, z)$ being the $k$th column of the matrix $\lambda(u, x, z)$. 

Theorem 4.1 shows the asymptotic magnitude of the optimal $\lambda_n$. For a 
given data set with finite sample, it is practically important to have a
data-driven method to select a good $\lambda_n$. Various techniques have been proposed 
in previous studies, such as the generalized cross-validation selector [6] and 
the BIC selector [27]. In this work, we use a BIC-like criterion to select the 
penalization parameter. The BIC criterion is defined as 

$\mathrm{BIC}(\lambda) = \log\left(\sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde\alpha(U_i) - Z_i^T\hat\beta^{\mathrm{OSE}}(\lambda)\}\right) + \frac{\log n}{n}\,\mathrm{df}_\lambda,$

where $\mathrm{df}_\lambda$ is the number of nonzero coefficients in the parametric part of 
the fitted model. We let $\hat\lambda_{\mathrm{BIC}} = \arg\min \mathrm{BIC}(\lambda)$. The performance of $\hat\lambda_{\mathrm{BIC}}$ 
will be examined in our simulation studies in the next section. 

Remark 4. Variable selection in linear quantile regression has been con- 
sidered in several papers; see Li and Zhu [22] and Wu and Liu [30]. The 
developed method for sparse semiparametric CQR can be easily adopted 
for variable selection in semiparametric quantile regression. Consider the 
penalized check loss 

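(4.3) $\sum_{i=1}^{n} \rho_\tau\{Y_i - \tilde\alpha_{0,\tau}(U_i) - X_i^T\tilde\alpha_\tau(U_i) - Z_i^T\beta\} + n\sum_{j=1}^{d_2} p_\lambda(|\beta_j|).$

For its one-step version, we use 

(4.4) $\sum_{i=1}^{n} \rho_\tau\{Y_i - \tilde\alpha_{0,\tau}(U_i) - X_i^T\tilde\alpha_\tau(U_i) - Z_i^T\beta\} + n\sum_{j=1}^{d_2} p'_\lambda(|\hat\beta_{\tau,j}^{(0)}|)|\beta_j|,$

where $\hat\beta_\tau^{(0)}$ denotes the unpenalized semiparametric quantile regression
estimator defined in Section 2. We can also prove the oracle property of the 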
one-step sparse semiparametric quantile regression estimator by following 
the lines of proof for Theorem 4.1. For reasons of brevity, we omit the de- 
tails here. 

5. Numerical studies. In this section, we conduct simulation studies to 
assess the finite-sample performance of the proposed procedures and illus- 
trate the proposed methodology on a real-world data set in a health study. In 
all examples, we fix the kernel function to be the Epanechnikov kernel, that 
is, $K(u) = \frac{3}{4}(1 - u^2)_+$, and we use the SCAD penalty function for variable 
selection. Note that all proposed estimators, including semi-QR, semi-CQR 
and one-step sparse semi-CQR, can be formulated as linear programming 
(LP) problems. In our study, we solved these estimators by using LP tools. 
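
As a sketch of why such estimators reduce to linear programs, consider a generic weighted check-loss problem with design rows $x_i$, weights $w_i \ge 0$ and parameter $\theta$. This generic notation is assumed here for illustration; the kernel weights and the stacked local-linear design would play the roles of $w_i$ and $x_i$. Splitting each residual into its positive and negative parts gives the standard formulation:

```latex
\begin{align*}
\min_{\theta,\,u^+,\,u^-} \quad & \sum_{i=1}^{n} w_i\{\tau u_i^+ + (1-\tau)u_i^-\}\\
\text{subject to} \quad & y_i - x_i^T\theta = u_i^+ - u_i^-, \qquad u_i^+ \ge 0,\; u_i^- \ge 0, \quad i = 1, \ldots, n.
\end{align*}
```

For a composite loss, one such block would be stacked per quantile level with shared slopes and level-specific intercepts, and a weighted $L_1$ penalty can be linearized by the same positive/negative split of each coefficient.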

Example 1. In this example, we generate 400 random samples, each 
consisting of n = 200 observations, from the model 

$Y = \alpha_1(U)X_1 + \alpha_2(U)X_2 + \beta_1Z_1 + \beta_2Z_2 + \beta_3Z_3 + \varepsilon,$

where $\alpha_1(U) = \sin(6\pi U)$, $\alpha_2(U) = \sin(2\pi U)$, $\beta_1 = 2$, $\beta_2 = 1$ and $\beta_3 = 0.5$. 
The covariate $U$ is from the uniform distribution on [0, 1]. The covariates
$X_1, X_2, Z_1, Z_2$ are jointly normally distributed with mean 0, variance 
1 and correlation 2/3. The covariate $Z_3$ is Bernoulli with $\Pr(Z_3 = 1) = 0.4$. 
Furthermore, $U$ and $(X_1, X_2, Z_1, Z_2, Z_3)$ are independent. In our simulation,
we considered the following error distributions: $N(0, 1)$, logistic, standard
Cauchy, $t$-distribution with 3 degrees of freedom, mixture of normals 
$0.9N(0, 1) + 0.1N(0, 10^2)$ and log-normal distribution. Because the error is 
independent of the covariates, the least-squares (LS), quantile regression 
(QR) and composite quantile regression (CQR) procedures provide estimates 
for the same quantity and hence are directly comparable. 
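
A minimal sketch of one way to draw a sample from this design with standard normal errors follows. It assumes that "correlation 2/3" means a common pairwise correlation among $X_1, X_2, Z_1, Z_2$, generated here through a shared factor; the function name, seed handling and dictionary layout are illustrative choices, not the authors' code.

```python
import math
import random


def simulate_example1(n=200, seed=0):
    """Draw n observations from the Example 1 design with N(0, 1) errors."""
    rng = random.Random(seed)
    rho = 2.0 / 3.0  # assumed common pairwise correlation
    rows = []
    for _ in range(n):
        u = rng.random()  # U ~ Uniform[0, 1)
        shared = rng.gauss(0.0, 1.0)
        # Equicorrelated standard normals: sqrt(rho)*W + sqrt(1 - rho)*E_j
        x1, x2, z1, z2 = (
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(4)
        )
        z3 = 1 if rng.random() < 0.4 else 0  # Bernoulli(0.4)
        eps = rng.gauss(0.0, 1.0)
        y = (math.sin(6 * math.pi * u) * x1 + math.sin(2 * math.pi * u) * x2
             + 2.0 * z1 + 1.0 * z2 + 0.5 * z3 + eps)
        rows.append({"u": u, "x1": x1, "x2": x2, "z1": z1, "z2": z2,
                     "z3": z3, "eps": eps, "y": y})
    return rows


sample = simulate_example1(n=200, seed=1)
```

Other error laws from the list above can be swapped in by replacing the line that draws `eps`.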

Performance of $\hat\beta_\tau$ and $\hat\beta$. To examine the performance of the proposed 
procedures with a wide range of bandwidths, three bandwidths for LS were 
considered, $h = 0.085, 0.128, 0.192$, which correspond to undersmoothing,
appropriate smoothing and oversmoothing, respectively. By straightforward
calculation, as in Kai, Li and Zou [13], we can produce two simple
formulas for the asymptotic optimal bandwidths for QR and CQR: 

$h_{\mathrm{CQR}} = h_{\mathrm{LS}} \cdot R_2(q)^{1/5}$ and $h_{\mathrm{QR},\tau} = h_{\mathrm{LS}} \cdot \{\tau(1 - \tau)/f[F^{-1}(\tau)]^2\}^{1/5}$, where $h_{\mathrm{LS}}$ 

is the asymptotic optimal bandwidth for LS. We considered only the case of 
normal error. The bias and standard deviation based on 400 simulations are 
reported in Table 1. First, we see that the estimators are not very sensitive to 
the choice of bandwidth. As for the estimation accuracy, all three estimators 
have comparable bias and the differences are shown in standard deviation. 
The LS estimates have the smallest standard deviation, as expected. The 
CQR estimates are slightly worse than the LS estimates. 
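
The two bandwidth conversions can be sketched numerically for standard normal errors. The code assumes the formulas $h_{\mathrm{CQR}} = h_{\mathrm{LS}} R_2(q)^{1/5}$ and $h_{\mathrm{QR},\tau} = h_{\mathrm{LS}}\{\tau(1-\tau)/f[F^{-1}(\tau)]^2\}^{1/5}$ with $R_2(q) = \sum_{k,k'}\tau_{kk'}/\{\sum_k f(c_k)\}^2$; the function names are illustrative.

```python
from statistics import NormalDist


def qr_bandwidth(h_ls, tau, dist=NormalDist()):
    """h_QR,tau = h_LS * {tau(1 - tau) / f(F^{-1}(tau))^2}^{1/5}."""
    dens = dist.pdf(dist.inv_cdf(tau))
    return h_ls * (tau * (1.0 - tau) / dens ** 2) ** 0.2


def cqr_bandwidth(h_ls, q, dist=NormalDist()):
    """h_CQR = h_LS * R_2(q)^{1/5} with equally spaced levels k / (q + 1)."""
    taus = [k / (q + 1) for k in range(1, q + 1)]
    t = sum(min(a, b) - a * b for a in taus for b in taus)
    c = sum(dist.pdf(dist.inv_cdf(a)) for a in taus)
    return h_ls * (t / c ** 2) ** 0.2


h_median = qr_bandwidth(0.128, 0.5)   # median regression
h_cqr9 = cqr_bandwidth(0.128, 9)      # composite fit with q = 9
```

Under normal errors the composite bandwidth stays close to the least-squares one, while single-quantile bandwidths are inflated more, most strongly in the tails where the error density is small.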

In the second study, we fixed h = 0.128 and compared the efficiency of QR 
and CQR relative to LS. Reported in Table 2 are RMSEs, the ratios of the 
MSEs of the QR and CQR estimators to the LS estimator for different error 
distributions. Several observations can be made from Table 2. When the 
error follows the normal distribution, the RMSEs of CQR are slightly less 
than 1. For all other non-normal distributions in the table, the RMSE can be 
much greater than 1, indicating a huge gain in efficiency. These findings agree 
with the asymptotic theory. For QR estimators, their performance varies and 
depends heavily on the level of quantile and the error distribution. Overall, 
CQR outperforms both QR and LS. 

Table 1 
Summary of the bias and standard deviation over 400 simulations 

                             Bias (SD)
h      Method    $\hat\beta_1$     $\hat\beta_2$     $\hat\beta_3$
0.085  LSE       -0.012 (0.128)    0.008 (0.121)   -0.009 (0.171)
       CQR_9     -0.009 (0.131)    0.009 (0.125)   -0.007 (0.172)
       QR_0.25   -0.017 (0.163)    0.009 (0.161)   -0.151 (0.237)
       QR_0.50   -0.012 (0.155)    0.011 (0.151)   -0.007 (0.198)
       QR_0.75   -0.007 (0.165)    0.005 (0.158)    0.122 (0.216)
0.128  LSE       -0.009 (0.121)    0.005 (0.117)   -0.008 (0.164)
       CQR_9     -0.010 (0.127)    0.008 (0.121)   -0.005 (0.163)
       QR_0.25   -0.010 (0.159)    0.003 (0.152)   -0.082 (0.227)
       QR_0.50   -0.008 (0.154)    0.011 (0.147)   -0.004 (0.193)
       QR_0.75   -0.012 (0.163)    0.003 (0.161)    0.071 (0.207)
0.192  LSE       -0.007 (0.128)    0.001 (0.123)   -0.008 (0.169)
       CQR_9     -0.009 (0.131)    0.005 (0.127)   -0.005 (0.169)
       QR_0.25   -0.006 (0.169)   -0.004 (0.169)   -0.061 (0.230)
       QR_0.50   -0.005 (0.153)    0.006 (0.152)   -0.007 (0.191)
       QR_0.75   -0.012 (0.170)    0.007 (0.171)    0.049 (0.225)

Performance of $\hat\alpha_\tau$ and $\hat\alpha$. We now compare the LS, QR and CQR
estimates for $\alpha$ by using the ratio of average squared errors (RASE). We first 
compute 

$\mathrm{ASE} = \frac{1}{n_{\mathrm{grid}}}\sum_{m=1}^{2}\sum_{k=1}^{n_{\mathrm{grid}}} \{\hat\alpha_m(u_k) - \alpha_m(u_k)\}^2,$

where $\{u_k : k = 1, \ldots, n_{\mathrm{grid}}\}$ is a set of grid points uniformly placed on [0, 1] 
with $n_{\mathrm{grid}} = 200$. RASE is then defined to be 

$\mathrm{RASE}(\hat g) = \frac{\mathrm{ASE}(\hat g_{\mathrm{LS}})}{\mathrm{ASE}(\hat g)}$

for an estimator $\hat g$, where $\hat g_{\mathrm{LS}}$ is the least-squares-based estimator. 
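
The comparison metric can be sketched as follows, assuming that ASE is the grid average of squared errors summed over the coefficient functions and that RASE divides the least-squares ASE by that of the competing estimator, so that values above 1 favor the competitor. The grid construction, function names and toy estimators are illustrative assumptions.

```python
import math


def uniform_grid(n_grid=200):
    """n_grid equally spaced points on [0, 1] (endpoints included; an assumption)."""
    return [k / (n_grid - 1) for k in range(n_grid)]


def ase(estimates, truths, grid):
    """Average squared error, summed over coefficient functions."""
    total = sum(
        (est(u) - tru(u)) ** 2 for est, tru in zip(estimates, truths) for u in grid
    )
    return total / len(grid)


def rase(estimates, ls_estimates, truths, grid):
    """Ratio ASE(LS) / ASE(estimator); values above 1 favour the estimator."""
    return ase(ls_estimates, truths, grid) / ase(estimates, truths, grid)


# Toy check: true functions from Example 1, with constant offsets as "estimates".
truths = [lambda u: math.sin(6 * math.pi * u), lambda u: math.sin(2 * math.pi * u)]
ls_fit = [lambda u, f=f: f(u) + 0.2 for f in truths]
other_fit = [lambda u, f=f: f(u) + 0.1 for f in truths]
ratio = rase(other_fit, ls_fit, truths, uniform_grid())
```

Halving the pointwise error quarters the ASE, so the toy ratio is 4.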

The sample mean and standard deviation of the RASEs over 400 sim- 
ulations are presented in Table 3, where the values in the parentheses are 
the standard deviations. The findings are quite similar to those in Table 2. 
We see that CQR performs almost as well as LS when the error is normally 
distributed. Also, its RASEs are much larger than 1 for other non-normal 
error distributions. The efficiency gain can be substantial. Note that for the 
Cauchy distribution, RASEs of QR and CQR are huge — this is because LS 
fails when the error variance is infinite. 
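
As a rough illustration of how such a ratio can be computed, the sketch below evaluates an error on a uniform grid of 200 points in [0, 1] and divides the LS error by that of a competing estimator. It assumes a plain average squared error as the error measure; the function names, the toy coefficient function and the constant offsets standing in for fitted curves are hypothetical and are not the simulation code behind Table 3.

```python
import numpy as np

def average_squared_error(g_hat, g_true, grid):
    """Mean squared difference between two functions over the grid (assumed error measure)."""
    return float(np.mean((g_hat(grid) - g_true(grid)) ** 2))

def rase(g_hat, g_hat_ls, g_true, grid):
    """Error of the LS fit divided by the error of g_hat; values above 1 favour g_hat."""
    return average_squared_error(g_hat_ls, g_true, grid) / average_squared_error(g_hat, g_true, grid)

grid = np.linspace(0.0, 1.0, 200)            # 200 uniformly placed grid points
g_true = lambda u: np.sin(2 * np.pi * u)     # hypothetical true coefficient function
g_ls = lambda u: g_true(u) + 0.2             # hypothetical LS fit (constant offset)
g_cqr = lambda u: g_true(u) + 0.1            # hypothetical CQR fit (smaller offset)

ratio = rase(g_cqr, g_ls, g_true, grid)
print(ratio)
```

With these toy inputs the LS error is four times the competing error, so the ratio is about 4; a ratio close to 1 would indicate comparable accuracy.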

Example 2. The goal is to compare the proposed one-step sparse semi- 
CQR estimator with the one-step sparse semi-LS estimator. In this example, 
400 random samples, each consisting of n = 200 observations, were generated 
from the varying-coefficient partially linear model 

$$Y = \alpha_1(U)X_1 + \alpha_2(U)X_2 + \beta^T Z + \varepsilon,$$

where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$ and the covariate vector $(X_1, X_2, Z^T)^T$ is
normally distributed with mean 0, variance 1 and correlation $0.5^{|i-j|}$ ($i, j =$
Table 2

Summary of the ratio of MSE over 400 simulations

                    RMSE
Method     $\hat\beta_1$   $\hat\beta_2$   $\hat\beta_3$

Standard normal
CQR9       0.920      0.932      1.011
QR0.25     0.585      0.594      0.460
QR0.50     0.621      0.631      0.724
QR0.75     0.554      [missing]  [missing]

Logistic
CQR9       1.044      1.083      1.016
QR0.25     0.651      0.664      0.502
QR0.50     0.826      0.871      0.799
QR0.75     0.661      [illegible] [illegible]

Standard Cauchy
CQR9       15,246     106,710    52,544
QR0.25     8894       56,704     24,359
QR0.50     19,556     137,109    66,560
QR0.75     8223       62,282     26,210

t-distribution with df = 3
CQR9       1.554      1.546      1.683
QR0.25     1.000      0.948      0.819
QR0.50     1.354      1.333      1.451
QR0.75     0.935      1.059      0.859

$0.9N(0, 1) + 0.1N(0, 10^2)$
CQR9       5.752      4.860      5.152
QR0.25     3.239      3.096      2.300
QR0.50     5.430      4.730      4.994
QR0.75     3.790      2.952      2.515

Log-normal
CQR9       3.079      3.369      3.732
QR0.25     5.198      5.361      3.006
QR0.50     2.787      2.829      3.139
QR0.75     0.819      0.868      0.823

Table 3

Summary of the RASE over 400 simulations

Method   Normal          Logistic        Cauchy             $t_3$           Mixture         Log-normal

CQR9     0.968 (0.104)   1.040 (0.134)   12,872 (176,719)   1.428 (1.299)   3.292 (1.405)   2.455 (1.498)
QR0.25   0.666 (0.160)   0.720 (0.203)   7621 (110,692)     0.958 (0.647)   2.029 (1.003)   3.490 (3.224)
QR0.50   0.771 (0.184)   0.881 (0.206)   13,720 (187,298)   1.274 (1.166)   3.155 (1.323)   2.155 (1.674)
QR0.75   0.681 (0.191)   0.713 (0.201)   5781 (87,909)      0.896 (0.325)   1.953 (0.905)   0.824 (0.679)
$1, \ldots, 10$). Other model settings are exactly the same as those in Example 1.
We use the generalized mean square error (GMSE), as defined in [21],

$$\mathrm{GMSE}(\hat\beta) = (\hat\beta - \beta)^T E(ZZ^T)(\hat\beta - \beta), \tag{5.2}$$

to assess the performance of variable selection procedures for the parametric 
component. For each procedure, we calculate the relative GMSE (RGMSE), 
which is defined to be the ratio of the GMSE of a selected final model to 
that of the unpenalized least-squares estimate under the full model. 
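
The GMSE in (5.2) and the relative GMSE can be sketched as follows for the design of this example, where $E(ZZ^T)$ is the block of the $0.5^{|i-j|}$ correlation matrix belonging to the eight Z-covariates. The helper names and the two perturbed coefficient vectors are hypothetical stand-ins for fitted estimates; the sketch only illustrates the arithmetic, not the estimation procedure.

```python
import numpy as np

def ar_correlation(dim, rho=0.5):
    """Correlation matrix with (i, j) entry rho^|i-j|."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def gmse(beta_hat, beta_true, sigma_z):
    """Generalized mean square error (beta_hat - beta)^T E(ZZ^T) (beta_hat - beta)."""
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(diff @ sigma_z @ diff)

beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
sigma_all = ar_correlation(10)      # covariate vector ordered as (X1, X2, Z1, ..., Z8)
sigma_z = sigma_all[2:, 2:]         # E(ZZ^T): Z has mean 0 and unit variances

beta_full = beta_true + 0.10        # hypothetical unpenalized LS fit under the full model
beta_selected = beta_true.copy()
beta_selected[0] += 0.05            # hypothetical sparse fit: noise coefficients exactly zero

rgmse = gmse(beta_selected, beta_true, sigma_z) / gmse(beta_full, beta_true, sigma_z)
print(rgmse)
```

An RGMSE below 1 means that the selected model has a smaller model error than the unpenalized full-model fit.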

The results over 400 simulations are summarized in Table 4, where the 
column "RGMSE" reports both the median and MAD of 400 RGMSEs. 
Both columns "C" and "IC" are measures of model complexity. Column 
"C" shows the average number of zero coefficients correctly estimated to be 
zero and column "IC" presents the average number of nonzero coefficients 
incorrectly estimated to be zero. In the column labeled "U-fit" (short for
"under-fit"), we present the proportion of trials excluding any nonzero
coefficients in 400 replications. Likewise, we report the probability of trials
selecting the exact subset model and the probability of trials including all
three significant variables and some noise variables in the columns "C-fit"
("correct-fit") and "O-fit" ("over-fit"), respectively. From Table 4, we see



Table 4

One-step estimates for variable selection in semiparametric models

                 RGMSE           No. of zeros      Proportion of fits
Method           Median (MAD)    C       IC        U-fit   C-fit   O-fit

Standard normal
One-step LS      0.335 (0.194)   4.825   0.000     0.000   0.867   0.133
One-step CQR     0.288 (0.213)   4.990   0.000     0.000   0.990   0.010

Logistic
One-step LS      0.352 (0.197)   4.805   0.000     0.000   0.870   0.130
One-step CQR     0.289 (0.206)   4.975   0.000     0.000   0.975   0.025

Standard Cauchy
One-step LS      0.956 (0.249)   2.920   0.795     0.595   0.108   0.297
One-step CQR     0.005 (0.021)   5.000   0.295     0.210   0.790   0.000

t-distribution with df = 3
One-step LS      0.346 (0.179)   4.803   0.000     0.000   0.860   0.140
One-step CQR     0.183 (0.177)   4.987   0.000     0.000   0.988   0.013

$0.9N(0, 1) + 0.1N(0, 10^2)$
One-step LS      0.331 (0.190)   4.848   0.000     0.000   0.883   0.117
One-step CQR     0.060 (0.083)   4.997   0.000     0.000   0.998   0.003

Log-normal
One-step LS      0.303 (0.182)   4.845   0.000     0.000   0.887   0.113
One-step CQR     0.111 (0.118)   4.990   0.000     0.000   0.990   0.010
that both variable selection procedures dramatically reduce model errors,
which clearly shows the virtue of variable selection. Moreover, the one-step
CQR performs better than the one-step LS in terms of all of the criteria
(RGMSE, number of zeros and proportion of fits) and for all of the error
distributions in Table 4. It is also interesting to see that in the normal error
case, the one-step CQR seems to perform no worse than the one-step LS
(or even slightly better). We performed the Mann-Whitney test to compare
their RGMSEs and the corresponding p-value is 0.0495. This observation
appears to be contradictory to the asymptotic theory. However, this
"contradiction" can be explained by observing that the one-step CQR has better
variable selection performance. Note that the one-step CQR has a significantly
higher probability of selecting the correct model than the one-step LS, and
that the one-step LS tends to overselect. Thus, the one-step LS needs to
estimate a larger model than the truth, compared to the one-step CQR.
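
The selection summaries reported in Table 4 (C, IC, U-fit, C-fit and O-fit) can be computed from a collection of estimated coefficient vectors along the following lines. This is a minimal sketch: the function name, the exact-zero tolerance and the four hypothetical replications are assumptions made for illustration.

```python
import numpy as np

def selection_summary(beta_hats, beta_true, tol=0.0):
    """Average zero counts and fit proportions over replications (rows of beta_hats)."""
    est_zero = np.abs(np.asarray(beta_hats, dtype=float)) <= tol
    true_zero = np.asarray(beta_true, dtype=float) == 0
    c = est_zero[:, true_zero].sum(axis=1)      # zero coefficients correctly set to zero
    ic = est_zero[:, ~true_zero].sum(axis=1)    # nonzero coefficients wrongly set to zero
    n_zero = int(true_zero.sum())
    return {
        "C": float(c.mean()),
        "IC": float(ic.mean()),
        "U-fit": float((ic > 0).mean()),                     # some nonzero coefficient excluded
        "C-fit": float(((ic == 0) & (c == n_zero)).mean()),  # exact subset selected
        "O-fit": float(((ic == 0) & (c < n_zero)).mean()),   # all signals plus some noise
    }

beta_true = [3, 1.5, 0, 0, 2, 0, 0, 0]
beta_hats = [                        # four hypothetical replications
    [3.1, 1.4, 0, 0, 2.2, 0, 0, 0],  # correct fit
    [2.9, 1.6, 0.1, 0, 1.9, 0, 0, 0],  # over-fit
    [3.0, 0, 0, 0, 2.0, 0, 0, 0],    # under-fit
    [3.2, 1.5, 0, 0, 2.1, 0, 0, 0],  # correct fit
]
summary = selection_summary(beta_hats, beta_true)
print(summary)
```

By construction the three proportions sum to one, which is also the case in every row of Table 4.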

Example 3. As an illustration, we apply the proposed procedures to an- 
alyze the plasma beta-carotene level data set collected by a cross-sectional 
study [24]. This data set consists of 273 samples. Of interest are the relation- 
ships between the plasma beta-carotene level and the following covariates: 
age, smoking status, quetelet index (BMI), vitamin use, number of calories, 
grams of fat, grams of fiber, number of alcoholic drinks, cholesterol and di- 
etary beta-carotene. The complete description of the data can be found in 
the StatLib database via the link lib.stat.cmu.edu/datasets/Plasma_Retinol. 

We fit the data by using a partially linear model with U being "dietary
beta-carotene." The covariates "smoking status" and "vitamin use" are
categorical and are thus replaced with dummy variables. All of the other
covariates are standardized. We applied the one-step sparse CQR and LS
estimators to fit the partially linear regression model. Five-fold cross-validation
was used to select the bandwidths for LS and CQR. We used the first 200 
observations as a training data set to fit the model and to select significant 
variables, then used the remaining 73 observations to evaluate the predictive 
ability of the selected model. 

The prediction performance is measured by the median absolute prediction
error (MAPE), which is the median of $\{|y_i - \hat y_i|, i = 1, 2, \ldots, 73\}$. To
see the effect of q on the CQR estimate, we tried q = 5, 7, 9. We found that
the selected Z-variables are the same for these three values of q and their
MAPEs are 58.52, 58.11 and 62.43, respectively. Thus, the effect of q is minor.
The resulting model with q = 7 is given in Table 5 and the estimated intercept
function is depicted in Figure 1. From Table 5, it can be seen that the
CQR model is much sparser than the LS model. Only two covariates, "fiber
consumption per day" and "fairly often use of vitamin," are included in the
parametric part of the CQR model. Meanwhile, the CQR model has much
better prediction performance than the LS model, whose MAPE is 111.28.
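
For concreteness, the MAPE criterion is simply the median of the absolute prediction errors on the hold-out observations. The sketch below uses five hypothetical response and prediction values rather than the plasma beta-carotene data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Median absolute prediction error: median of |y_i - yhat_i|."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.median(errors))

y_test = [120.0, 80.0, 200.0, 150.0, 95.0]    # hypothetical hold-out responses
y_pred = [100.0, 90.0, 260.0, 140.0, 100.0]   # hypothetical predictions
value = mape(y_test, y_pred)
print(value)  # absolute errors are 20, 10, 60, 10, 5, so the median is 10.0
```

Because it is a median, the criterion is not dominated by a few badly predicted observations, which suits a response with heavy right tails.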
Table 5

Selected parametric components for plasma beta-carotene level data

Covariate                          $\hat\beta^{\mathrm{OSE}}_{\mathrm{LS}}$    $\hat\beta^{\mathrm{OSE}}_{\mathrm{CQR}}$

Age
Quetelet index
Calories                           -100.47
Fat                                52.60
Fiber                              87.51      29.89
Alcohol                            44.61
Cholesterol
Smoking status (never)             51.71
Smoking status (former)            72.48
Vitamin use (yes, fairly often)    130.39     30.21
Vitamin use (yes, not often)

MAPE                               111.28     58.11



6. Discussion. We discuss some directions in which this work could be 
further extended. We have focused on using uniform weights in composite 
quantile regression. In theory, we can use nonuniform weights, which may 
provide an even more efficient estimator when a reliable estimate of the error 
distribution is available. Koenker [16] discussed the theoretically optimal 
weights. Bradic, Fan and Wang [1] suggested a data-driven weighted CQR 
for parametric linear regression, in which the weights mimic the optimal 



Fig. 1. Plot of estimated intercept function of dietary beta-carotene: (a) the estimated
intercept function $\hat\alpha_0(u)$ by LS method; (b) the estimated intercept function $\hat\alpha_0(u)$
by CQR method with q = 7. The horizontal axis of both panels is dietary beta-carotene.
weights. The idea in Bradic, Fan and Wang [1] can be easily extended to the
semi-CQR estimator, which will be investigated in detail in a future paper. 
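
To make the role of the weights concrete, the sketch below evaluates a composite quantile loss of the form $\sum_k w_k \sum_i \rho_{\tau_k}(r_i - b_k)$, which reduces to the uniform-weight CQR loss when all $w_k = 1$. The function names, the residuals, the quantile-specific intercepts and the nonuniform weight vector are hypothetical; the data-driven weights of [1] are not reproduced here.

```python
import numpy as np

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I(r < 0))."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))

def composite_quantile_loss(residuals, intercepts, taus, weights=None):
    """Weighted sum over quantile levels of the summed check losses."""
    taus = np.asarray(taus, dtype=float)
    if weights is None:
        weights = np.ones_like(taus)            # uniform weights
    residuals = np.asarray(residuals, dtype=float)
    per_level = [check_loss(residuals - b, t).sum() for b, t in zip(intercepts, taus)]
    return float(np.dot(weights, per_level))

residuals = [-1.0, 0.5, 2.0]        # hypothetical residuals
taus = [0.25, 0.50, 0.75]           # q = 3 quantile levels
intercepts = [-0.5, 0.0, 0.5]       # hypothetical quantile-specific intercepts

uniform = composite_quantile_loss(residuals, intercepts, taus)
weighted = composite_quantile_loss(residuals, intercepts, taus, weights=[0.5, 1.0, 0.5])
print(uniform, weighted)
```

Only the weight vector changes between the two calls, so a nonuniform scheme can be swapped in without altering the rest of the objective.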

Penalized Wilcoxon rank regression has been considered independently in
Leng [20] and Wang and Li [29] and found to achieve an efficiency property
similar to that of CQR for variable selection in parametric linear regression. We could
also generalize rank regression to handle semiparametric varying-coefficient
partially linear models. In a working paper, we show that rank regression is
exactly equivalent to CQR using $q = n - 1$ quantiles with uniform weights.
This result indicates that CQR is more flexible than rank regression be- 
cause we can easily use flexible nonuniform weights in CQR to further im- 
prove efficiency, as in Bradic, Fan and Wang [1]. Obviously, CQR is also 
computationally more efficient than rank regression. We note that in para- 
metric linear regression models, rank regression has no efficiency gain over 
least-squares for estimating the intercept. This result is expected to hold for 
estimating the baseline function in the semiparametric varying-coefficient 
partially linear model. 

When the number of varying coefficient components is large, it is also 
desirable to consider selecting a few important components. This problem 
was studied in Wang and Xia [28], where a LASSO-type penalized local 
least-squares estimator was proposed. It would be interesting to apply CQR 
to their method to further improve the estimation efficiency. 

7. Proofs. To establish the asymptotic properties of the proposed esti- 
mators, the following regularity conditions are imposed: 

(C1) the random variable U has bounded support and its density function
$f_U(\cdot)$ is positive and has a continuous second derivative;

(C2) the varying coefficients $\alpha_0(\cdot)$ and $\alpha(\cdot)$ have continuous second
derivatives in $u \in \Omega$;

(C3) $K(\cdot)$ is a symmetric density function with bounded support and
satisfies a Lipschitz condition;

(C4) the random vector Z has bounded support;

(C5) for the semi-QR procedure,

(i) $F_\tau(0|u, x, z) = \tau$ for all $(u, x, z)$, and $f_\tau(\cdot|u, x, z)$ is bounded away
from zero and has a continuous and uniformly bounded derivative;

(ii) $A_1(u)$ defined in Theorem 2.1 and $A_2(u)$ defined in Theorem 2.3
are nonsingular for all $u \in \Omega$;

(C6) for the semi-CQR procedure,

(i) $f(\cdot)$ is bounded away from zero and has a continuous and uniformly
bounded derivative;

(ii) $D_1(u)$ defined in Theorem 3.1 and $D_2(u)$ defined in Theorem 3.3
are nonsingular for all $u \in \Omega$.






Although the proposed semi-QR and semi-CQR procedures require different
regularity conditions, the proofs follow similar strategies. For brevity, we
only present the detailed proofs for the semi-CQR procedure. The detailed
proofs for the semi-QR procedure were given in the earlier version of this
paper. Lemma 7.1 below, which is a direct result of Mack and Silverman
[23], will be used repeatedly in our proofs. Throughout the proofs, identities
of the form $G(u) = O_p(a_n)$ always stand for $\sup_{u \in \Omega} |G(u)| = O_p(a_n)$.



Lemma 7.1. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. random vectors, where
the $Y_i$'s are scalar random variables. Assume, further, that $E|Y|^r < \infty$ and
that $\sup_x \int |y|^r f(x, y)\,dy < \infty$, where f denotes the joint density of $(X, Y)$.
Let K be a bounded positive function with bounded support, satisfying a
Lipschitz condition. Then,

$$\sup_x \left| \frac{1}{n} \sum_{i=1}^n \{K_h(X_i - x)Y_i - E[K_h(X_i - x)Y_i]\} \right| = O_p\left( \frac{\log^{1/2}(1/h)}{\sqrt{nh}} \right),$$

provided that $n^{2\varepsilon - 1} h \to \infty$ for some $\varepsilon < 1 - r^{-1}$.



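Let $\eta_{i,k} = I(\varepsilon_i \le c_k) - \tau_k$ and $\eta^*_{i,k}(u) = I\{\varepsilon_i \le c_k - r_i(u)\} - \tau_k$, where
$r_i(u) = \alpha_0(U_i) - \alpha_0(u) - \alpha_0'(u)(U_i - u) + X_i^T\{\alpha(U_i) - \alpha(u) - \alpha'(u)(U_i - u)\}$.
Furthermore, let $\tilde\theta^* = \sqrt{nh}\{\tilde a_{01} - \alpha_0(u) - c_1, \ldots, \tilde a_{0q} - \alpha_0(u) - c_q, \{\tilde a - \alpha(u)\}^T, \{\tilde\beta - \beta_0\}^T, h\{\tilde b_0 - \alpha_0'(u)\}, h\{\tilde b - \alpha'(u)\}^T\}^T$
and $X^*_{i,k}(u) = \{e_k^T, X_i^T, Z_i^T, (U_i - u)/h, X_i^T(U_i - u)/h\}^T$, where $e_k$ is a q-vector with 1 at the kth
position and 0 elsewhere.

In the proof of Theorem 3.1, we will first show the following asymptotic
representation of $\{\tilde a_0, \tilde b_0, \tilde a, \tilde b, \tilde\beta\}$:

$$\tilde\theta^* = -f_U^{-1}(u)\{S^*(u)\}^{-1} W_n^*(u) + O_p\big(h^2 + \log^{1/2}(1/h)/\sqrt{nh}\big), \tag{7.1}$$

where $S^*(u) = \mathrm{diag}\{D_1(u), c\mu_2 B_2(u)\}$ and

$$W_n^*(u) = \frac{1}{\sqrt{nh}} \sum_{k=1}^q \sum_{i=1}^n K_i(u)\,\eta^*_{i,k}(u)\,X^*_{i,k}(u).$$

The asymptotic normality of $\{\tilde a_0, \tilde b_0, \tilde a, \tilde b, \tilde\beta\}$ then follows by demonstrating
the asymptotic normality of $W_n^*(u)$.

Proof of Theorem 3.1. Recall that $\{\tilde a_0, \tilde a, \tilde\beta, \tilde b_0, \tilde b\}$ minimizes

$$\sum_{k=1}^q \sum_{i=1}^n \rho_{\tau_k}\big[Y_i - a_{0k} - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\} - Z_i^T\beta\big] K_h(U_i - u).$$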
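We write $Y_i - a_{0k} - b_0(U_i - u) - X_i^T\{a + b(U_i - u)\} - Z_i^T\beta = (\varepsilon_i - c_k) + r_i(u) - \Delta_{i,k}$,
where $\Delta_{i,k} = \{X^*_{i,k}(u)\}^T\theta^*/\sqrt{nh}$. Then, $\tilde\theta^*$ is also the minimizer of

$$L_n^*(\theta^*) = \sum_{k=1}^q \sum_{i=1}^n K_i(u)\big[\rho_{\tau_k}\{(\varepsilon_i - c_k) + r_i(u) - \Delta_{i,k}\} - \rho_{\tau_k}\{(\varepsilon_i - c_k) + r_i(u)\}\big],$$

where $K_i(u) = K\{(U_i - u)/h\}$. By applying the identity [14]

$$\rho_\tau(x - y) - \rho_\tau(x) = y\{I(x \le 0) - \tau\} + \int_0^y \{I(x \le z) - I(x \le 0)\}\,dz, \tag{7.2}$$

we have

$$\begin{aligned}
L_n^*(\theta^*) &= \sum_{k=1}^q \sum_{i=1}^n K_i(u)\Big\{\Delta_{i,k}\big[I\{\varepsilon_i \le c_k - r_i(u)\} - \tau_k\big] + \int_0^{\Delta_{i,k}} \big[I\{\varepsilon_i \le c_k - r_i(u) + z\} - I\{\varepsilon_i \le c_k - r_i(u)\}\big]\,dz\Big\} \\
&= \{W_n^*(u)\}^T\theta^* + \sum_{k=1}^q B^*_{n,k}(\theta^*),
\end{aligned}$$

where

$$B^*_{n,k}(\theta^*) = \sum_{i=1}^n K_i(u) \int_0^{\Delta_{i,k}} \big[I\{\varepsilon_i \le c_k - r_i(u) + z\} - I\{\varepsilon_i \le c_k - r_i(u)\}\big]\,dz.$$

Since $B^*_{n,k}(\theta^*)$ is a summation of i.i.d. random variables of the kernel form,
it follows, by Lemma 7.1, that

$$B^*_{n,k}(\theta^*) = E[B^*_{n,k}(\theta^*)] + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).$$

The conditional expectation of $\sum_{k=1}^q B^*_{n,k}(\theta^*)$ can be calculated as

$$\begin{aligned}
\sum_{k=1}^q E[B^*_{n,k}(\theta^*) \mid U, X, Z] &= \sum_{k=1}^q \sum_{i=1}^n K_i(u) \int_0^{\Delta_{i,k}} \big\{F(c_k - r_i(u) + z) - F(c_k - r_i(u))\big\}\,dz \\
&= \frac{1}{2}(\theta^*)^T \Big( \frac{1}{nh} \sum_{k=1}^q \sum_{i=1}^n K_i(u) f(c_k - r_i(u)) X^*_{i,k}(u)\{X^*_{i,k}(u)\}^T \Big)\theta^* + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big) \\
&\triangleq \frac{1}{2}(\theta^*)^T S_n^*(u)\,\theta^* + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).
\end{aligned}$$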
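Then,

$$\begin{aligned}
L_n^*(\theta^*) &= \{W_n^*(u)\}^T\theta^* + \sum_{k=1}^q E[B^*_{n,k}(\theta^*)] + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big) \\
&= \{W_n^*(u)\}^T\theta^* + \sum_{k=1}^q E\{E[B^*_{n,k}(\theta^*) \mid U, X, Z]\} + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big) \\
&= \{W_n^*(u)\}^T\theta^* + \frac{1}{2}(\theta^*)^T E[S_n^*(u)]\,\theta^* + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).
\end{aligned}$$

It can be shown that $E[S_n^*(u)] = f_U(u)S^*(u) + O(h^2)$. Therefore, we can
write $L_n^*(\theta^*)$ as

$$L_n^*(\theta^*) = \{W_n^*(u)\}^T\theta^* + \frac{f_U(u)}{2}(\theta^*)^T S^*(u)\,\theta^* + O_p\big(h^2 + \log^{1/2}(1/h)/\sqrt{nh}\big).$$

By applying the convexity lemma [25] and the quadratic approximation
lemma [4], the minimizer of $L_n^*(\theta^*)$ can be expressed as

$$\tilde\theta^* = -f_U^{-1}(u)\{S^*(u)\}^{-1} W_n^*(u) + O_p\big(h^2 + \log^{1/2}(1/h)/\sqrt{nh}\big), \tag{7.3}$$

which holds uniformly for $u \in \Omega$. Meanwhile, for any point $u \in \Omega$, we have

$$\tilde\theta^* = -f_U^{-1}(u)\{S^*(u)\}^{-1} W_n^*(u) + o_p(1). \tag{7.4}$$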

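Note that $S^*(u) = \mathrm{diag}\{D_1(u), c\mu_2 B_2(u)\}$ is a quasi-diagonal matrix. So,

$$\sqrt{nh} \begin{pmatrix} \tilde a_0 - \alpha_0(u) \\ \tilde a - \alpha(u) \\ \tilde\beta - \beta_0 \end{pmatrix} = -f_U^{-1}(u) D_1^{-1}(u) W^*_{n,1}(u) + o_p(1), \tag{7.5}$$

where $W^*_{n,1}(u) = \frac{1}{\sqrt{nh}} \sum_{k=1}^q \sum_{i=1}^n K_i(u)\,\eta^*_{i,k}(u)\,(e_k^T, X_i^T, Z_i^T)^T$. Let

$$W^{\#}_{n,1}(u) = \frac{1}{\sqrt{nh}} \sum_{k=1}^q \sum_{i=1}^n K_i(u)\,\eta_{i,k}\,(e_k^T, X_i^T, Z_i^T)^T.$$

Note that $\mathrm{Cov}(\eta_{i,k}, \eta_{i,k'}) = \tau_{kk'}$ with $\tau_{kk'} = \min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}$, and that $\mathrm{Cov}(\eta_{i,k}, \eta_{j,k'}) = 0$ if $i \ne j$.
By some calculations, we have that $E[W^{\#}_{n,1}(u)] = 0$ and $\mathrm{Var}[W^{\#}_{n,1}(u)] \to f_U(u)\nu_0\Sigma_1(u)$.
By the Cramér-Wold theorem, the central limit theorem for
$W^{\#}_{n,1}(u)$ holds. Therefore,

$$W^{\#}_{n,1}(u) \xrightarrow{D} N\big(0, f_U(u)\nu_0\Sigma_1(u)\big).$$

Moreover, we have $\mathrm{Var}[W^*_{n,1}(u) - W^{\#}_{n,1}(u) \mid U, X, Z] \le \frac{q^2}{nh} \sum_{i=1}^n K_i^2(u)(e_k^T, X_i^T, Z_i^T)^T(e_k^T, X_i^T, Z_i^T) \max_k\{F(c_k + |r_i|) - F(c_k)\} = o_p(1)$, thus

$$W^*_{n,1}(u) - W^{\#}_{n,1}(u) - E[W^*_{n,1}(u) \mid U, X, Z] \xrightarrow{P} 0.$$

So, by Slutsky's theorem, conditioning on $\{U, X, Z\}$, we have

$$W^*_{n,1}(u) - E[W^*_{n,1}(u)] \xrightarrow{D} N\big(0, f_U(u)\nu_0\Sigma_1(u)\big). \tag{7.6}$$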

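We now calculate the conditional mean of $W^*_{n,1}(u)$:

$$\begin{aligned}
\frac{1}{\sqrt{nh}} E[W^*_{n,1}(u) \mid U, X, Z] &= \frac{1}{nh} \sum_{k=1}^q \sum_{i=1}^n K_i(u)\{F(c_k - r_i(u)) - F(c_k)\}(e_k^T, X_i^T, Z_i^T)^T \\
&= -\frac{1}{nh} \sum_{k=1}^q \sum_{i=1}^n K_i(u)\,r_i(u) f(c_k)\{1 + o(1)\}(e_k^T, X_i^T, Z_i^T)^T.
\end{aligned} \tag{7.7}$$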

The proof is completed by combining (7.5), (7.6) and (7.7). □ 
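
The identity (7.2), which drives the decomposition of the objective function in the proof above, can be checked numerically. The sketch below compares the left-hand side with the right-hand side at randomly drawn points, approximating the integral by a midpoint Riemann sum; the function names, the grid size and the sampling ranges are assumptions made for illustration only.

```python
import numpy as np

def check_loss(x, tau):
    """Quantile check loss rho_tau(x) = x * (tau - I(x < 0))."""
    return x * (tau - float(x < 0))

def knight_rhs(x, y, tau, n_grid=100_000):
    """y{I(x<=0) - tau} + int_0^y {I(x<=z) - I(x<=0)} dz via a midpoint rule."""
    z = y * (np.arange(n_grid) + 0.5) / n_grid        # midpoints between 0 and y
    integral = y * np.mean((x <= z).astype(float) - float(x <= 0))
    return y * (float(x <= 0) - tau) + integral

rng = np.random.default_rng(0)
max_gap = 0.0
for _ in range(200):
    x, y = rng.uniform(-3, 3, size=2)
    tau = rng.uniform(0.05, 0.95)
    lhs = check_loss(x - y, tau) - check_loss(x, tau)
    max_gap = max(max_gap, abs(lhs - knight_rhs(x, y, tau)))
print(max_gap)  # limited only by the Riemann-sum discretisation error
```

The remaining gap shrinks as `n_grid` grows, since the integrand is an indicator difference with at most one jump.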

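Proof of Theorem 3.2. Let $\theta = \sqrt{n}(\beta - \beta_0)$. Then,

$$\begin{aligned}
Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde a(U_i) - Z_i^T\beta
&= \varepsilon_i - c_k - \{\tilde a_{0k}(U_i) - \alpha_0(U_i) - c_k\} - X_i^T\{\tilde a(U_i) - \alpha(U_i)\} - Z_i^T(\beta - \beta_0) \\
&= \varepsilon_i - c_k - \tilde r_{i,k} - Z_i^T\theta/\sqrt{n},
\end{aligned}$$

where $\tilde r_{i,k} = \{\tilde a_{0k}(U_i) - \alpha_0(U_i) - c_k\} + X_i^T\{\tilde a(U_i) - \alpha(U_i)\}$. Then,

$$\hat\theta_n = \arg\min_\theta \sum_{k=1}^q \sum_{i=1}^n \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde a(U_i) - Z_i^T\beta\}$$

is also the minimizer of

$$L_n(\theta) = \sum_{k=1}^q \sum_{i=1}^n \big\{\rho_{\tau_k}(\varepsilon_i - c_k - \tilde r_{i,k} - Z_i^T\theta/\sqrt{n}) - \rho_{\tau_k}(\varepsilon_i - c_k - \tilde r_{i,k})\big\}.$$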

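By applying the identity (7.2), we can rewrite $L_n(\theta)$ as follows:

$$\begin{aligned}
L_n(\theta) &= \sum_{k=1}^q \sum_{i=1}^n \Big\{ \frac{Z_i^T\theta}{\sqrt{n}}\big[I(\varepsilon_i \le c_k) - \tau_k\big] + \int_{\tilde r_{i,k}}^{\tilde r_{i,k} + Z_i^T\theta/\sqrt{n}} \big[I(\varepsilon_i \le c_k + z) - I(\varepsilon_i \le c_k)\big]\,dz \Big\} \\
&= \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n \eta_{i,k} Z_i \Big)^T\theta + B_n(\theta),
\end{aligned}$$

where $B_n(\theta) = \sum_{k=1}^q \sum_{i=1}^n \int_{\tilde r_{i,k}}^{\tilde r_{i,k} + Z_i^T\theta/\sqrt{n}} [I(\varepsilon_i \le c_k + z) - I(\varepsilon_i \le c_k)]\,dz$. Let
us now calculate the conditional expectation of $B_n(\theta)$:

$$\begin{aligned}
E[B_n(\theta) \mid U, X, Z] &= \sum_{k=1}^q \sum_{i=1}^n \int_{\tilde r_{i,k}}^{\tilde r_{i,k} + Z_i^T\theta/\sqrt{n}} \big[z f(c_k)\{1 + o(1)\}\big]\,dz \\
&= \frac{1}{2}\theta^T \Big( \frac{1}{n} \sum_{k=1}^q \sum_{i=1}^n f(c_k) Z_i Z_i^T \Big)\theta + \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n f(c_k)\,\tilde r_{i,k} Z_i \Big)^T\theta + o_p(1).
\end{aligned}$$

Define $R_n(\theta) = B_n(\theta) - E[B_n(\theta) \mid U, X, Z]$. It can be shown that $R_n(\theta) = o_p(1)$. Hence,

$$\begin{aligned}
L_n(\theta) &= \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n \eta_{i,k} Z_i \Big)^T\theta + E[B_n(\theta) \mid U, X, Z] + R_n(\theta) \\
&= \frac{1}{2}\theta^T S_n\theta + \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n \eta_{i,k} Z_i \Big)^T\theta + \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n f(c_k)\,\tilde r_{i,k} Z_i \Big)^T\theta + o_p(1),
\end{aligned}$$

where $S_n = \frac{1}{n} \sum_{k=1}^q \sum_{i=1}^n f(c_k) Z_i Z_i^T$. By (7.3), the coefficient vector of the third term in the previous
expression can be expressed as

$$\frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n f(c_k)\,\tilde r_{i,k} Z_i = -\frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n \eta_{i,k}\,\delta_k(U_i, X_i, Z_i) + o_p(1),$$

where $\delta_k(U_i, X_i, Z_i)$ is the kth column of

$$\delta(U_i, X_i, Z_i) = E[Z(\mathbf{c}^T, cX^T, 0) \mid U = U_i]\, D_1^{-1}(U_i)\,(I_q, \mathbf{1}X_i^T, \mathbf{1}Z_i^T)^T.$$

Therefore,

$$\begin{aligned}
L_n(\theta) &= \frac{1}{2}\theta^T S_n\theta + \Big( \frac{1}{\sqrt{n}} \sum_{k=1}^q \sum_{i=1}^n \eta_{i,k}\{Z_i - \delta_k(U_i, X_i, Z_i)\} \Big)^T\theta + o_p(1) \\
&\triangleq \frac{1}{2}\theta^T S_n\theta + W_n^T\theta + o_p(1).
\end{aligned}$$

It can be shown that $S_n = E(S_n) + o_p(1) = cS + o_p(1)$. Hence,

$$L_n(\theta) = \frac{c}{2}\theta^T S\theta + W_n^T\theta + o_p(1).$$

Since the convex function $L_n(\theta) - W_n^T\theta$ converges in probability to the
convex function $\frac{c}{2}\theta^T S\theta$, it follows, by the convexity lemma [25], that the
quadratic approximation to $L_n(\theta)$ holds uniformly for $\theta$ in any compact set
$\Theta$. Thus, it follows that

$$\hat\theta_n = -\frac{1}{c} S^{-1} W_n + o_p(1). \tag{7.8}$$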

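By the Cramér-Wold theorem, the central limit theorem for $W_n$ holds and
$\mathrm{Var}(W_n) \to \Delta = \sum_{k=1}^q \sum_{k'=1}^q \tau_{kk'} E[\{Z - \delta_k(U, X, Z)\}\{Z - \delta_{k'}(U, X, Z)\}^T]$.
Therefore, the asymptotic normality of $\hat\beta$ follows from

$$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{D} N\Big(0, \frac{1}{c^2} S^{-1}\Delta S^{-1}\Big).$$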



This completes the proof. □ 
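
The last two steps of this proof can be spelled out as a short worked derivation. It is a sketch under the assumptions that $S$ is symmetric and positive definite and that the $o_p(1)$ remainder can be ignored in the limit, as the convexity lemma permits; it adds nothing beyond the quantities $c$, $S$, $W_n$ and $\Delta$ used above.

```latex
% Minimizing the limiting quadratic form in theta:
\begin{aligned}
Q(\theta) &= \tfrac{c}{2}\,\theta^{T} S\,\theta + W_n^{T}\theta,
\qquad
\nabla_\theta Q(\theta) = c\,S\,\theta + W_n = 0
\;\Longrightarrow\;
\theta = -\tfrac{1}{c}\,S^{-1} W_n, \\[4pt]
% Variance of the minimizer, using Var(W_n) -> Delta and the symmetry of S:
\operatorname{Var}\!\Big(-\tfrac{1}{c}\,S^{-1} W_n\Big)
&= \tfrac{1}{c^{2}}\,S^{-1}\operatorname{Var}(W_n)\,S^{-1}
\;\longrightarrow\;
\tfrac{1}{c^{2}}\,S^{-1}\Delta\,S^{-1}.
\end{aligned}
```

The sandwich form of the limiting covariance therefore comes directly from the linear map $W_n \mapsto -c^{-1}S^{-1}W_n$.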

Proof of Theorem 3.3. The asymptotic normality of $\hat\alpha_0(u)$ and $\hat\alpha(u)$
can be obtained by following the ideas in the proof of Theorem 3.1. □

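Proof of Theorem 4.1. Use the same notation as in the proof of
Theorem 3.2. Minimizing

$$\sum_{k=1}^q \sum_{i=1}^n \rho_{\tau_k}\{Y_i - \tilde a_{0k}(U_i) - X_i^T\tilde a(U_i) - Z_i^T\beta\} + nq \sum_{j=1}^d p'_\lambda(|\hat\beta_j^{(0)}|)|\beta_j|$$

is equivalent to minimizing

$$\begin{aligned}
G_n(\theta) &= \sum_{k=1}^q \sum_{i=1}^n \big\{\rho_{\tau_k}(\varepsilon_i - c_k - \tilde r_{i,k} - Z_i^T\theta/\sqrt{n}) - \rho_{\tau_k}(\varepsilon_i - c_k - \tilde r_{i,k})\big\} + nq \sum_{j=1}^d p'_\lambda(|\hat\beta_j^{(0)}|)(|\beta_j| - |\beta_{0j}|) \\
&= \frac{c}{2}\theta^T S\theta + W_n^T\theta + nq \sum_{j=1}^d p'_\lambda(|\hat\beta_j^{(0)}|)(|\beta_j| - |\beta_{0j}|) + o_p(1),
\end{aligned}$$

where $\theta = \sqrt{n}(\beta - \beta_0)$ and $\tilde r_{i,k} = \{\tilde a_{0k}(U_i) - \alpha_0(U_i) - c_k\} + X_i^T\{\tilde a(U_i) - \alpha(U_i)\}$.
Similar to the derivation in the proof of Theorem 5 in Zou and Li
[34], the third term above can be expressed as

$$nq \sum_{j=1}^d p'_\lambda(|\hat\beta_j^{(0)}|)(|\beta_j| - |\beta_{0j}|) \xrightarrow{P} \begin{cases} 0, & \text{if } \beta_2 = \beta_{20}, \\ \infty, & \text{otherwise.} \end{cases} \tag{7.9}$$

Therefore, by the epiconvergence results [8, 15], we have $\hat\beta_2^{\mathrm{OSE}} \xrightarrow{P} 0$ and the
asymptotic results for $\hat\beta_1^{\mathrm{OSE}}$ hold.

To prove sparsity, we only need to show that $\hat\beta_2^{\mathrm{OSE}} = 0$ with probability
tending to 1. It suffices to prove that if $\beta_{0j} = 0$, then $P(\hat\beta_j^{\mathrm{OSE}} \ne 0) \to 0$.
By using the fact that $|\{\rho_\tau(t_1) - \rho_\tau(t_2)\}/(t_1 - t_2)| \le \max(\tau, 1 - \tau) < 1$, if $\hat\beta_j^{\mathrm{OSE}} \ne 0$, then
we must have $\sqrt{n}\,p'_\lambda(|\hat\beta_j^{(0)}|) < \frac{1}{\sqrt{n}} \sum_{i=1}^n |Z_{ij}|$. Thus, we have $P(\hat\beta_j^{\mathrm{OSE}} \ne 0) \le
P\big(\sqrt{n}\,p'_\lambda(|\hat\beta_j^{(0)}|) < \frac{1}{\sqrt{n}} \sum_{i=1}^n |Z_{ij}|\big)$. However, under the assumptions, we have
$\sqrt{n}\,p'_\lambda(|\hat\beta_j^{(0)}|) \to \infty$. Therefore, $P(\hat\beta_j^{\mathrm{OSE}} \ne 0) \to 0$. This completes the proof.
□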